Abstract. We define a free probability analogue of the Wasserstein metric,
which extends the classical one. In dimension one, we prove that the square of 
the Wasserstein distance to the semi-circle distribution is majorized by a modified 
free entropy quantity. 

Introduction 

The Wasserstein distance between two probability distributions $\mu, \nu$ on $\mathbb{R}^n$ is given by

$$W(\mu,\nu) = \inf_{\pi \in \Pi(\mu,\nu)} \left( \int |x-y|^2 \, d\pi(x,y) \right)^{1/2}$$

where $\Pi(\mu,\nu)$ denotes the probability measures on $\mathbb{R}^n \times \mathbb{R}^n$ with marginals $\mu$ and $\nu$.
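
For measures on the real line with finitely many equally weighted atoms, the infimum in this definition is attained by pairing the atoms in increasing order (the monotone coupling). A minimal Python sketch of that special case follows; the function name `wasserstein2_1d` is a hypothetical helper introduced only for illustration.

```python
import math


def wasserstein2_1d(xs, ys):
    """2-Wasserstein distance between two empirical measures on R.

    Assumes both samples have the same number of equally weighted atoms,
    in which case sorting both samples gives an optimal coupling.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))


# Translating a measure by 1 moves every atom by exactly 1.
print(wasserstein2_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```

The sorting step is specific to dimension one; for $n > 1$ the infimum is a genuine optimization over couplings.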

Following the usual free probability recipe we shall replace the set of probability measures 
by the trace-state space of a C* -algebra and take marginals with respect to a free product. 
In this note we begin the study of the ensuing free Wasserstein metric. 

An inequality of M. Talagrand ([7], [10]) relates the Wasserstein distance to a Gaussian
distribution and the relative entropy. In the one-variable case we prove a free probability
analogue of this inequality, where the semicircle law replaces the Gauss law and the logarithmic
energy plays the role of entropy. Note that in the case of n-tuples of commuting selfadjoint
variables the classical and the free Wasserstein distances are equal.

In the context of non-commutative geometry, there is a different noncommutative extension,
due to A. Connes [5], of the related Monge-Kantorovich metric. The Monge-Kantorovich
metric is the p-Wasserstein metric with p = 1, but the definition which is extended
is the dual definition based on Lipschitz functions, and the extension involves Fredholm
modules or derivations (recent work is surveyed in [9]).

1 The free Wasserstein metric



1.1 The distance on n-tuples of variables 

We will work in the framework of tracial C*-probability spaces $(M,\tau)$, where $M$ is a unital
C*-algebra and $\tau$ is a trace state. The simplest is to define the metric at the level of
noncommutative random variables. If $(X_1,\dots,X_n)$ and $(Y_1,\dots,Y_n)$ are two n-tuples of
noncommutative random variables in tracial C*-probability spaces $(M_1,\tau_1)$ and $(M_2,\tau_2)$,
we define

$$W_p((X_1,\dots,X_n),(Y_1,\dots,Y_n))$$

as the infimum of

$$\big\| (|X'_j - Y'_j|_p)_{1 \le j \le n} \big\|_p$$

over 2n-tuples $(X'_1,\dots,X'_n,Y'_1,\dots,Y'_n)$ of noncommutative random variables in some tracial
C*-probability space $(M_3,\tau_3)$ such that the n-tuples $(X'_1,\dots,X'_n)$, $(X_1,\dots,X_n)$ and
respectively $(Y'_1,\dots,Y'_n)$, $(Y_1,\dots,Y_n)$ have the same *-distributions. Here $|\cdot|_p$ is the
p-norm in a tracial C*-probability space, while $\|\cdot\|_p$ is the p-norm on $\mathbb{R}^n$. Like in the
classical case, if p = 2 we call $W_p$ the free Wasserstein metric and we will also use the notation
$W$ for $W_2$. We shall refer to $W_p$ as the free p-Wasserstein metric. Note also that if

$$X_j = D_j + iE_j, \qquad Y_j = F_j + iG_j,$$

where $D_j, E_j, F_j, G_j$ are self-adjoint, then

$$W((X_1,\dots,X_n),(Y_1,\dots,Y_n)) = W((D_1,\dots,D_n,E_1,\dots,E_n),(F_1,\dots,F_n,G_1,\dots,G_n)).$$

Note also that $W_p((X_1,\dots,X_n),(Y_1,\dots,Y_n))$ depends only on the *-distributions of
$(X_1,\dots,X_n)$ and $(Y_1,\dots,Y_n)$. If we consider n-tuples with the same *-distribution as
equivalent, then $W_p$ will be a distance between equivalence classes of n-tuples.

1.2 The distance on trace states 

We pass now to trace-state spaces $TS(A)$, where $A$ is a unital C*-algebra. We will assume
$A$ is finitely generated and we will assume such a generator $(a_1,\dots,a_n)$ has been specified.
The p-Wasserstein metric on $TS(A)$ is given by

$$W_p(\tau',\tau'') = W_p((a'_1,\dots,a'_n),(a''_1,\dots,a''_n))$$

where $\tau',\tau'' \in TS(A)$ and $(a'_1,\dots,a'_n)$ and $(a''_1,\dots,a''_n)$ denote the variables defined by
$(a_1,\dots,a_n)$ in $(A,\tau')$ and respectively $(A,\tau'')$.

This definition can be rephrased using free products. If $A_1, A_2$ are unital C*-algebras,
we denote by $\sigma_j : A_j \to A_1 * A_2$ the canonical injection of $A_j$ into the full free product
C*-algebra (this presumes amalgamation over $\mathbb{C}1$). If $\tau_j \in TS(A_j)$ $(1 \le j \le 2)$ we define

$$TS(A_1 * A_2; \tau_1, \tau_2) = \{ \tau \in TS(A_1 * A_2) \mid \tau \circ \sigma_j = \tau_j,\ j = 1,2 \}.$$

Remark that $\tau_1 * \tau_2 \in TS(A_1 * A_2; \tau_1, \tau_2)$.
It is easy to see that

$$W_p(\tau',\tau'') = \inf \big\{ \big\| (|\sigma_1(a_j) - \sigma_2(a_j)|_{p,\tau})_{1 \le j \le n} \big\|_p \ \big|\ \tau \in TS(A * A; \tau', \tau'') \big\}$$

where $|\cdot|_{p,\tau}$ denotes the p-norm in $L^p(A * A; \tau)$.

Remark also that the distance on n-tuples of variables can be obtained from the definition
for trace-states. Assume for simplicity $X_j = X_j^*$, $Y_j = Y_j^*$ and $R > \|X_j\|$,
$R > \|Y_j\|$, $1 \le j \le n$. Let then $A = (C[-R,R])^{*n}$ (the free product of n copies) and
$a_k = \sigma_k(a)$, where $a$ is the identical function in $C[-R,R]$. Let $\rho_j : A \to M_j$, $j = 1,2$ be the
*-homomorphisms such that $\rho_1(a_k) = X_k$, $\rho_2(a_k) = Y_k$ where the $X_k$'s are in $(M_1,\tau_1)$ and
the $Y_k$'s in $(M_2,\tau_2)$. Then

$$W_p(\tau',\tau'') = W_p((X_1,\dots,X_n),(Y_1,\dots,Y_n))$$

where $\tau' = \tau_1 \circ \rho_1$, $\tau'' = \tau_2 \circ \rho_2$.

1.3 Theorem. $W_p$ is a metric.

Proof. To check that $W_p$ is a metric on the set of equivalence classes of n-tuples of variables
or equivalently on a trace-state space $TS(A)$ like in 1.2, the nontrivial assertion is the
triangle inequality. Indeed that

$$W_p((X_1,\dots,X_n),(Y_1,\dots,Y_n)) = 0 \iff (X_1,\dots,X_n),\ (Y_1,\dots,Y_n) \text{ have the same *-distribution}$$

or

$$W_p(\tau',\tau'') = 0 \iff \tau' = \tau''$$

are easy to see. For the triangle inequality it will suffice to prove it in the context of 1.1.

Let $(X'_1,\dots,X'_n,Y'_1,\dots,Y'_n)$ in $(M_{12},\tau_{12})$ and $(Y''_1,\dots,Y''_n,Z''_1,\dots,Z''_n)$ in $(M_{23},\tau_{23})$ be
2n-tuples in tracial W*-probability spaces such that $(X'_1,\dots,X'_n) \sim (X_1,\dots,X_n)$,
$(Y'_1,\dots,Y'_n) \sim (Y''_1,\dots,Y''_n) \sim (Y_1,\dots,Y_n)$, $(Z''_1,\dots,Z''_n) \sim (Z_1,\dots,Z_n)$, where $\sim$ means
the n-tuples have equal *-distribution. There is a trace-preserving isomorphism between
$W^*(Y'_1,\dots,Y'_n)$ and $W^*(Y''_1,\dots,Y''_n)$ which identifies $Y'_j$ and $Y''_j$. Abusing notations we shall
denote by $M_2$ the von Neumann subalgebras of $M_{12}$ and $M_{23}$ generated by $(Y'_1,\dots,Y'_n)$ and
respectively $(Y''_1,\dots,Y''_n)$ identified as above. Let $E'$ and $E''$ be the conditional expectations
of $M_{12}$ and respectively $M_{23}$ onto $M_2$.

Let $(M_{123},E) = (M_{12},E') *_{M_2} (M_{23},E'')$ and $\tau_{123} = \tau_2 \circ E$ where $\tau_2 = \tau_{12}|M_2 = \tau_{23}|M_2$
(see 3.8 in [14]). Further, with $\rho_{12} : M_{12} \to M_{123}$, $\rho_{23} : M_{23} \to M_{123}$ denoting the canonical
embeddings, let $X'''_j = \rho_{12}(X'_j)$, $Z'''_j = \rho_{23}(Z''_j)$. Then $\rho_{12}(Y'_j) = \rho_{23}(Y''_j)$ implies

$$|X'''_j - Z'''_j|_{p,\tau_{123}} \le |X'''_j - \rho_{12}(Y'_j)|_{p,\tau_{123}} + |\rho_{23}(Y''_j) - Z'''_j|_{p,\tau_{123}} = |X'_j - Y'_j|_{p,\tau_{12}} + |Y''_j - Z''_j|_{p,\tau_{23}}$$

which, together with Minkowski's inequality for $\|\cdot\|_p$ on $\mathbb{R}^n$, is precisely what we need to
establish the triangle inequality

$$W_p((X_1,\dots,X_n),(Y_1,\dots,Y_n)) + W_p((Y_1,\dots,Y_n),(Z_1,\dots,Z_n)) \ge W_p((X_1,\dots,X_n),(Z_1,\dots,Z_n)).$$

□

Let us also record as a proposition some easy consequences of the compactness of the
trace-state space. The proof is left to the reader.

1.4 Proposition. (a) The infimum in the definition of $W_p$ is attained (both in the 1.1 and
1.2 contexts).

(b) Let $\tau_1^{(k)}, \tau_1, \tau_2^{(k)}, \tau_2 \in TS(A)$ and assume $\tau_j^{(k)}$ converges weakly to $\tau_j$ as $k \to \infty$
(j = 1,2). Then

$$\liminf_{k \to \infty} W_p(\tau_1^{(k)},\tau_2^{(k)}) \ge W_p(\tau_1,\tau_2).$$

(c) Let $(X_1^{(k)},\dots,X_n^{(k)})$, $(X_1,\dots,X_n)$, $(Y_1^{(k)},\dots,Y_n^{(k)})$, $(Y_1,\dots,Y_n)$ be n-tuples of
variables in tracial C*-probability spaces and assume that $\|X_j^{(k)}\| \le R$, $\|X_j\| \le R$,
$\|Y_j^{(k)}\| \le R$, $\|Y_j\| \le R$, and that $(X_1^{(k)},\dots,X_n^{(k)})$, $(Y_1^{(k)},\dots,Y_n^{(k)})$ converge in *-distribution
to $(X_1,\dots,X_n)$ and respectively $(Y_1,\dots,Y_n)$. Then

$$\liminf_{k \to \infty} W_p((X_1^{(k)},\dots,X_n^{(k)}),(Y_1^{(k)},\dots,Y_n^{(k)})) \ge W_p((X_1,\dots,X_n),(Y_1,\dots,Y_n)).$$

If $(X_1,\dots,X_n)$ are commuting self-adjoint variables in a tracial C*-probability space,
then their distribution $\mu_{X_1,\dots,X_n}$ is a compactly supported probability measure on $\mathbb{R}^n$.

1.5 Theorem. Let $(X_1,\dots,X_n)$ and $(Y_1,\dots,Y_n)$ be two n-tuples of commuting self-adjoint
variables in tracial C*-probability spaces. Then the free and classical Wasserstein distances
are equal:

$$W((X_1,\dots,X_n),(Y_1,\dots,Y_n)) = W(\mu_{X_1,\dots,X_n},\mu_{Y_1,\dots,Y_n}).$$

Proof. The left-hand side is $\le$ the right-hand side, since the classical Wasserstein
distance can be defined the same way as the free one, with the only difference that the
2n-tuples $(X'_1,\dots,X'_n,Y'_1,\dots,Y'_n)$ in the infimum are required to live in commutative tracial
C*-probability spaces. We therefore only need to prove $\ge$.

Let $(X'_1,\dots,X'_n,Y'_1,\dots,Y'_n)$ be a 2n-tuple in the infimum defining the free distance.
Passing to the von Neumann algebra completion, we may assume $(M_3,\tau_3)$, where $X'_j, Y'_j$ live,
is a W*-probability space with a normal faithful trace state. Let $A = W^*(X'_1,\dots,X'_n) \subset M_3$,
$B = W^*(Y'_1,\dots,Y'_n) \subset M_3$ and let $E_A$ be the canonical conditional expectation onto $A$.
Then the unital trace-preserving completely positive map $\varphi = E_A|B : B \to A$ gives rise to
a state $\nu : A \otimes B \to \mathbb{C}$, on a commutative algebra, defined by

$$\nu(a \otimes b) = \tau_3(a\varphi(b)).$$

The positivity of $\nu$,

$$\tau_3\Big(\sum_{i,j} a_i a_j^* \varphi(b_i b_j^*)\Big) \ge 0,$$

is easily inferred from the positivity of the matrix $(\varphi(b_i b_j^*))_{i,j}$. Alternatively, probabilistically,
$\nu$ is the probability measure on $\mathbb{R}^{2n}$ obtained by integrating w.r.t. $\mu_{X_1,\dots,X_n}$ the kernel of
probability measures describing

$$\varphi : L^\infty(\mathbb{R}^n, \mu_{Y_1,\dots,Y_n}) \to L^\infty(\mathbb{R}^n, \mu_{X_1,\dots,X_n}).$$

Then

$$\begin{aligned}
\sum_{1 \le j \le n} \nu\big((X'_j \otimes 1 - 1 \otimes Y'_j)^2\big)
&= \sum_{1 \le j \le n} \tau_3\big(X_j'^2 + \varphi(Y_j'^2) - X'_j\varphi(Y'_j) - \varphi(Y'_j)X'_j\big) \\
&= \sum_{1 \le j \le n} \tau_3\big(X_j'^2 + Y_j'^2 - 2E_A(X'_j Y'_j)\big) \\
&= \sum_{1 \le j \le n} \tau_3\big((X'_j - Y'_j)^2\big).
\end{aligned}$$

Since $A \otimes B$ is commutative this proves the theorem. □



2 Cost of transportation to the semicircle distribution 

2.1 The complex quasilinear differential equation 

Let $X$, $S$ in $(M,\tau)$ be self-adjoint and freely independent and assume $S$ is (0,1) semicircular.
The purpose of section 2 is to estimate $W(X,S)$. We begin by studying variables
$X(t) = e^{-t/2}X + (1 - e^{-t})^{1/2}S$ which have the same distribution as the variables in the free
Ornstein-Uhlenbeck process. For technical reasons, and without extra work, the complex PDE will
be derived under the more general assumption that $X$ is unbounded self-adjoint affiliated
with $M$ (see [1]).

If $Y$ is self-adjoint affiliated with $M$, we denote by $\mu_Y$ its distribution and by $G_{\mu_Y}(z)$
or $G_Y(z)$ the Cauchy transform of $\mu_Y$, which equals $\tau((zI - Y)^{-1})$.

If $Y(r) = X + r^{1/2}S$, let $\tilde G(r,z) = G_{Y(r)}(z)$ and $G(t,z) = G_{X(t)}(z)$, $\operatorname{Im} z > 0$, $r \ge 0$,
$t \ge 0$. Then $\tilde G$ satisfies the complex Burgers equation (see [3], [12])

$$\frac{\partial \tilde G}{\partial r} + \tilde G \frac{\partial \tilde G}{\partial z} = 0.$$

Like $\tilde G(r,z)$ also $G(t,z)$ is $C^1$ on $[0,\infty) \times \{z \in \mathbb{C} \mid \operatorname{Im} z > 0\}$ and holomorphic in $z$ for
fixed $t$. Note that $X(t) = e^{-t/2}Y(e^t - 1)$ and that $G_{aY}(z) = a^{-1}G_Y(a^{-1}z)$. It follows that
$G(t,z) = e^{t/2}\tilde G(e^t - 1, e^{t/2}z)$. The complex Burgers equation then gives

$$\frac{\partial G}{\partial t} + G\frac{\partial G}{\partial z} = \frac{1}{2} z \frac{\partial G}{\partial z} + \frac{1}{2} G \qquad (1)$$

with initial data $G(0,z) = G_X(z)$.
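
As a sanity check of the two equations of this subsection, one can take $X = 0$, so that $Y(r) = r^{1/2}S$ and $X(t) = (1 - e^{-t})^{1/2}S$ are semicircular with variances $r$ and $1 - e^{-t}$. The sketch below assumes the standard closed form $G(z) = (z - (z^2 - 4v)^{1/2})/(2v)$ for the Cauchy transform of a centered semicircle law of variance $v$, and evaluates by central finite differences the residuals of the complex Burgers equation and of equation (1), taken here in the form $\partial_t G + G\,\partial_z G = 2^{-1}(z\,\partial_z G + G)$. The helper names and the test point are illustrative choices, not part of the text.

```python
import cmath
import math


def semicircle_cauchy(v, z):
    """Cauchy transform of the centered semicircle law of variance v, Im z > 0.

    Writing sqrt(z^2 - 4v) as z * sqrt(1 - 4v/z^2) keeps the principal branch
    analytic on the upper half-plane and gives G(z) ~ 1/z at infinity.
    """
    return (z - z * cmath.sqrt(1 - 4 * v / (z * z))) / (2 * v)


def G_tilde(r, z):
    # Assumption X = 0: Y(r) = r^{1/2} S is semicircular of variance r.
    return semicircle_cauchy(r, z)


def G(t, z):
    # Assumption X = 0: X(t) = (1 - e^{-t})^{1/2} S has variance 1 - e^{-t}.
    return semicircle_cauchy(1 - math.exp(-t), z)


def d_first(f, a, z, h=1e-5):
    """Central difference in the first (time-like) argument."""
    return (f(a + h, z) - f(a - h, z)) / (2 * h)


def d_second(f, a, z, h=1e-5):
    """Central difference in z (a real step suffices for a holomorphic f)."""
    return (f(a, z + h) - f(a, z - h)) / (2 * h)


def burgers_residual(r, z):
    return d_first(G_tilde, r, z) + G_tilde(r, z) * d_second(G_tilde, r, z)


def ou_residual(t, z):
    lhs = d_first(G, t, z) + G(t, z) * d_second(G, t, z)
    rhs = 0.5 * z * d_second(G, t, z) + 0.5 * G(t, z)
    return lhs - rhs


z0 = 0.3 + 0.7j
# Both residuals should vanish up to finite-difference error.
print(abs(burgers_residual(1.5, z0)), abs(ou_residual(0.8, z0)))
```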
2.2 The transport equation

Here we shall assume that the distribution of $X$ is of the form $P_\lambda * \mu$ where $P_\lambda$ is the Cauchy
distribution with density $\pi^{-1}\lambda(\lambda^2 + x^2)^{-1}$ $(\lambda > 0)$ and $\mu$ has compact support. Since
$P_\lambda * \mu = P_\lambda \boxplus \mu$ ([1]) this is equivalent to replacing $X$ with $X + \lambda C$ where $X$ is bounded, $X$
and $C$ are free and $C$ has a Cauchy distribution $P_1$. Note that

$$\mu_{X + \lambda C + r^{1/2}S} = \mu_{X + r^{1/2}S} * P_\lambda, \qquad G_{X + \lambda C + r^{1/2}S}(z) = G_{X + r^{1/2}S}(z + i\lambda),$$

etc. Thus, if the distribution of $X$ is of the form $P_\lambda * \mu$ then the equation (1) is satisfied
on an extended domain

$$\{(t,z) \in [0,\infty) \times \mathbb{C} \mid \operatorname{Im} z > -e^{-t/2}\lambda\}.$$

Let $-\pi^{-1}G(t,x) = q(x,t) + ip(x,t)$ where $x \in \mathbb{R}$. Then $p(\cdot,t)$ is the density of $\mu_{X(t)}$
and is analytic. For fixed $t$ and $k \ge 0$ we have

$$\frac{\partial^k}{\partial x^k} p(x,t) = O\big((1 + |x|)^{-2-k}\big) \quad \text{and} \quad \frac{\partial^k}{\partial x^k} q(x,t) = O\big((1 + |x|)^{-1-k}\big).$$

Moreover these bounds are uniform for t in a compact set.
Equation (1) gives

$$\begin{aligned}
q_t &= \pi(q q_x - p p_x) + 2^{-1}(x q_x + q) \\
p_t &= \pi(p q_x + q p_x) + 2^{-1}(x p_x + p) \\
q &= -Hp
\end{aligned} \qquad (2)$$

where $H$ denotes the Hilbert transform.
Since $p(x,t) > 0$ we infer that

$$f(a,t) = \int_{-\infty}^{a} p(x,t)\,dx$$

is a $C^\infty$-diffeomorphism $f(\cdot,t) : \mathbb{R} \to (0,1)$ which transports $\mu_{X(t)}$ to Lebesgue measure.
Hence $\varphi_{s,t}(\cdot) = f^{-1}(f(\cdot,s),t)$ $(0 \le s \le t)$ will be a $C^\infty$-diffeomorphism $\mathbb{R} \to \mathbb{R}$, which
transports $\mu_{X(s)}$ to $\mu_{X(t)}$. This is the same as saying that $X(t)$ and $\varphi_{s,t}(X(s))$ have the
same distribution.

It is easily seen that

$$\frac{\partial}{\partial t} f^{-1}(y,t) = -\frac{\big(\frac{\partial}{\partial t} f\big)(f^{-1}(y,t),t)}{p(f^{-1}(y,t),t)}.$$

Using (2) to compute $\frac{\partial}{\partial t} f$ we find

$$\frac{\partial}{\partial t} f(a,t) = \int_{-\infty}^{a} \big(\pi(pq)_x + 2^{-1}(xp)_x\big)\,dx = \pi(pq)(a,t) + 2^{-1}a\,p(a,t).$$

Hence

$$\frac{\partial}{\partial t} f^{-1}(y,t) = -\pi q(f^{-1}(y,t),t) - 2^{-1}f^{-1}(y,t).$$

For $y = f(x,s)$ we get the transport equation

$$\frac{\partial}{\partial t}\varphi_{s,t}(x) = \pi(Hp(\cdot,t))(\varphi_{s,t}(x)) - 2^{-1}\varphi_{s,t}(x) \qquad (3)$$

with initial condition $\varphi_{s,s}(x) = x$.

By the $L^m$-continuity $(1 < m < \infty)$ results for the density (see Corollary 2 in [ ])
applied to $\mu \boxplus \mu_{r^{1/2}S}$ as a function of $r$, we infer after convolutions with Cauchy distributions
the continuity of

$$(0,\infty) \ni t \mapsto Hp(\cdot,t) \in L^m$$

(the $L^m$-space w.r.t. Lebesgue measure). The reader should keep these facts in mind in
computations where we shall use (3).

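Lemma 2.3 Assume $X$ has distribution $\mu * P_\lambda$, where $\mu$ has compact support and let
$X(t) = e^{-t/2}X + (1 - e^{-t})^{1/2}S$ with $S$ (0,1)-semicircular and free from $X$. Let $g \in C^1(\mathbb{R})$
be such that $\|g\|_\infty < \infty$, $\|g'\|_\infty \le 1$ and assume $g'$ has compact support. Then

$$(t - s)^{-2}\, W(g(X(s)),g(X(t)))^2 \le \sup_{s \le h \le t} \int_{\operatorname{supp} g'} \big(\pi Hp(\cdot,h)(x) - 2^{-1}x\big)^2 p(x,h)\,dx.$$

Proof. We have

$$\begin{aligned}
W(g(X(s)),g(X(t)))^2
&\le \int |g(x) - g(\varphi_{s,t}(x))|^2\, p(x,s)\,dx \\
&\le \int \Big( \int_s^t g'(\varphi_{s,h}(x)) \big(\pi Hp(\cdot,h)(\varphi_{s,h}(x)) - 2^{-1}\varphi_{s,h}(x)\big)\,dh \Big)^2 p(x,s)\,dx \\
&\le (t - s) \int \int_s^t \big(g'(\varphi_{s,h}(x))\big)^2 \big(\pi Hp(\cdot,h)(\varphi_{s,h}(x)) - 2^{-1}\varphi_{s,h}(x)\big)^2 dh\; p(x,s)\,dx \\
&= (t - s) \int_s^t \Big( \int \big(g'(\varphi_{s,h}(x))\big)^2 \big(\pi Hp(\cdot,h)(\varphi_{s,h}(x)) - 2^{-1}\varphi_{s,h}(x)\big)^2 p(x,s)\,dx \Big)\,dh \\
&= (t - s) \int_s^t \int \big(g'(x)\big)^2 \big(\pi Hp(\cdot,h)(x) - 2^{-1}x\big)^2 p(x,h)\,dx\,dh \\
&\le (t - s)^2 \sup_{s \le h \le t} \int_{\operatorname{supp} g'} \big(\pi Hp(\cdot,h)(x) - 2^{-1}x\big)^2 p(x,h)\,dx.
\end{aligned}$$

□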



2.4. Assume $X$ is bounded and the semicircular variable $S$ is free w.r.t. $X$. Then the
distribution $\mu_{X(t)}$ of $X(t) = e^{-t/2}X + (1 - e^{-t})^{1/2}S$ has $L^\infty$-density $p(\cdot,t)$ w.r.t. Lebesgue
measure (see any of the papers [1], [2], [3], [11], [12]).

Lemma. Assume $X$ is bounded, $S$ is (0,1) semicircular, $X$ and $S$ are free and let
$p(\cdot,t)$ be the density of $\mu_{X(t)}$, where $X(t) = e^{-t/2}X + (1 - e^{-t})^{1/2}S$. Then

$$(t - s)^{-2}\, W(X(s),X(t))^2 \le \sup_{s \le h \le t} \int \big(\pi Hp(\cdot,h)(x) - 2^{-1}x\big)^2 p(x,h)\,dx.$$

Proof. Let $C$ be a variable with Cauchy distribution and free w.r.t. $\{X,S\}$. Let $g \in C^\infty(\mathbb{R})$
be such that $\|g'\|_\infty \le 1$, $g(x) = x$ if $|x| \le \|X\| + 1$ and $g'(x) = 0$ if $|x| \ge \|X\| + 2$.
We shall apply Lemma 2.3 to $X + \lambda C$ in place of $X$. Let

$$Z(t,\lambda) = e^{-t/2}(X + \lambda C) + (1 - e^{-t})^{1/2}S = X(t) + e^{-t/2}\lambda C.$$

Then $g(Z(t,\lambda))$ is an operator of norm $\le \|X\| + 2$ and converges in distribution to $X(t)$
as $\lambda \downarrow 0$. Moreover the distribution of $Z(t,\lambda)$ is given by the density $P_{e^{-t/2}\lambda} * p(\cdot,t)$ and will be
denoted by $p(\cdot,t,\lambda)$. In view of the $L^m$-continuity of $p(\cdot,t)$ $(1 < m < \infty)$ ([12]) it is easy
to see that

$$\limsup_{\lambda \downarrow 0} \Big( \sup_{s \le h \le t} \int_{\operatorname{supp} g'} \big(\pi Hp(\cdot,h,\lambda)(x) - 2^{-1}x\big)^2 p(x,h,\lambda)\,dx \Big)
\le \sup_{s \le h \le t} \int \big(\pi Hp(\cdot,h)(x) - 2^{-1}x\big)^2 p(x,h)\,dx.$$

□



2.5. From now on we return to the context of bounded variables $X$. If the distribution of
$X$ is Lebesgue absolutely continuous and has density $p$ which is $L^3$, then $\frac{1}{2}J(X) = \pi Hp(X)$
where $J(X)$ is the conjugate variable (a.k.a. free Brownian gradient, a.k.a. noncommutative
Hilbert transform) (see [13]) and

$$\Phi(X) = \tau(J(X)^2) = 4\pi^2 \int (Hp(x))^2 p(x)\,dx = \frac{4}{3}\pi^2 \int p^3(x)\,dx$$

is the free Fisher information (see [11], [13] up to different normalizations). The quantity
occurring in Lemma 2.4,

$$I(X) = 4\int \big(\pi Hp(x) - 2^{-1}x\big)^2 p(x)\,dx = \tau((J(X) - X)^2) = \Phi(X) - 2 + \tau(X^2),$$

is a generalization of the free Fisher information for Ornstein-Uhlenbeck processes (see [4]).
The inequality in Lemma 2.4 can also be written

$$4(t - s)^{-2}\, W(X(s),X(t))^2 \le \sup_{s \le h \le t} I(X(h)). \qquad (4)$$
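
Inequality (4) can be checked in closed form when $X$ is itself semicircular. The sketch below assumes $X = aS'$ with $S'$ a (0,1) semicircular variable free from $S$, so that $X(t)$ is semicircular of variance $v(t) = e^{-t}a^2 + 1 - e^{-t}$. It also uses $\Phi = 1/v$ for a semicircular variable of variance $v$ and the fact that, by Theorem 1.5 and the monotone coupling on the line, the distance between two centered semicircle laws is the difference of their standard deviations. All function names are hypothetical helpers.

```python
import math


def variance(a, t):
    """Variance of X(t) = e^{-t/2} a S' + (1 - e^{-t})^{1/2} S (assumed model)."""
    return math.exp(-t) * a * a + 1.0 - math.exp(-t)


def wasserstein_sq(a, s, t):
    """W(X(s), X(t))^2 for centered semicircle laws: (sd(t) - sd(s))^2."""
    return (math.sqrt(variance(a, t)) - math.sqrt(variance(a, s))) ** 2


def fisher_I(a, h):
    """I(X(h)) = Phi - 2 + tau(X(h)^2), with Phi = 1/v in the semicircular case."""
    v = variance(a, h)
    return 1.0 / v - 2.0 + v


def check_inequality_4(a, s, t, grid=1000):
    """Return (left-hand side of (4), grid approximation of sup I(X(h)))."""
    lhs = 4.0 * wasserstein_sq(a, s, t) / (t - s) ** 2
    sup_I = max(fisher_I(a, s + (t - s) * k / grid) for k in range(grid + 1))
    return lhs, sup_I


for a in (0.5, 2.0, 3.0):
    lhs, sup_I = check_inequality_4(a, 0.1, 0.6)
    print(a, lhs <= sup_I)
```

In this family $\frac{d}{dh} v(h)^{1/2} = (1 - v)/(2v^{1/2})$, whose square is $I(X(h))/4$, so (4) reduces to the mean value theorem.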

2.6 The free entropy 

The free entropy of $X$ with distribution $\mu = \mu_X$ is

$$\chi(X) = \iint \log|s - t|\,d\mu(s)\,d\mu(t) + \frac{3}{4} + \frac{1}{2}\log(2\pi)$$

(see [11], [13] up to different constants) and we have

$$\chi(aX) = \chi(X) + \log|a| \quad \text{and} \quad \lim_{\varepsilon \downarrow 0} \varepsilon^{-1}\big(\chi(X + \varepsilon^{1/2}S) - \chi(X)\big) = 2^{-1}\Phi(X).$$

The quantity we shall use in estimating the distance to the semicircle distribution is a
modified free entropy adapted to the free Ornstein-Uhlenbeck process ([4]):

$$\begin{aligned}
\Sigma(X) &= -\chi(X) + \chi(S) + \frac{1}{2}\tau(X^2) - \frac{1}{2} \\
&= \frac{1}{2}\tau(X^2) - \iint d\mu(s)\,d\mu(t) \log|s - t| - \frac{3}{4}.
\end{aligned}$$

We have $\lim_{t \to \infty} \Sigma(X(t)) = 0$ and

$$\begin{aligned}
\frac{d}{dt}\Sigma(X(t)) &= \frac{d}{dt}\Big( \frac{t}{2} - \chi\big(X + (e^t - 1)^{1/2}S\big) + \frac{1}{2}e^{-t}\tau(X^2) + \frac{1}{2}(1 - e^{-t}) \Big) \\
&= 2^{-1}\big(1 - e^t\,\Phi(X + (e^t - 1)^{1/2}S) - e^{-t}\tau(X^2) + e^{-t}\big) \\
&= 2^{-1}\big(1 - \Phi(X(t)) - \tau(X(t)^2) + 1\big) = -2^{-1}I(X(t)).
\end{aligned}$$

Note also that in [4] using the logarithmic Sobolev inequality for $\chi$ (Prop. 7.9 in [13]), it
is shown that

$$\Sigma(X(t)) \le 2^{-1}I(X(t)) \qquad (5)$$

which is a logarithmic Sobolev inequality for the Ornstein-Uhlenbeck process.
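
The derivative formula $\frac{d}{dt}\Sigma(X(t)) = -2^{-1}I(X(t))$ and inequality (5) can be tested on a semicircular example. The sketch assumes $X = aS'$ with $S'$ a (0,1) semicircular variable free from $S$, so that $X(t)$ is semicircular of variance $v = e^{-t}a^2 + 1 - e^{-t}$. The scaling rule for $\chi$ then gives $\Sigma = -2^{-1}\log v + 2^{-1}v - 2^{-1}$, and $\Phi = 1/v$ gives $I = 1/v - 2 + v$. Function names are hypothetical.

```python
import math


def variance(a, t):
    """Variance of X(t) when X = a S' is semicircular (assumed model)."""
    return math.exp(-t) * a * a + 1.0 - math.exp(-t)


def sigma(v):
    """Sigma for a semicircular variable of variance v, via chi(bS) = chi(S) + log b."""
    return -0.5 * math.log(v) + 0.5 * v - 0.5


def fisher_I(v):
    """I = Phi - 2 + tau(X^2) with Phi = 1/v and tau(X^2) = v."""
    return 1.0 / v - 2.0 + v


def dsigma_dt(a, t, h=1e-5):
    """Central finite difference of t -> Sigma(X(t))."""
    return (sigma(variance(a, t + h)) - sigma(variance(a, t - h))) / (2 * h)


a, t = 2.0, 0.5
v = variance(a, t)
# The two printed numbers should agree up to finite-difference error.
print(dsigma_dt(a, t), -0.5 * fisher_I(v))
# Inequality (5) on a few variances; equality holds only at v = 1.
print(all(sigma(w) <= 0.5 * fisher_I(w) + 1e-12 for w in (0.1, 0.5, 1.0, 2.0, 10.0)))
```

In this family (5) is equivalent to the elementary bound $\log u \le u - 1$ with $u = 1/v$.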

Lemma 2.7 Assume $X$, $Y$ are bounded and self-adjoint, then if $t > 0$ we have

$$\limsup_{\varepsilon \to 0} |\varepsilon|^{-1}\, |W(Y,X(t + \varepsilon)) - W(Y,X(t))| \le 2^{-1}(I(X(t)))^{1/2}.$$

Proof. By the triangle inequality for $W$, we have

$$|W(Y,X(t + \varepsilon)) - W(Y,X(t))| \le W(X(t),X(t + \varepsilon)).$$

The lemma then follows from (4) and the continuity of $I(X(h))$ $(h > 0)$, which is a consequence
of the continuity of $\Phi(X(h))$ (Corollary 2 in [12]). □

We now have all ingredients to get an estimate for W(X, S) which is similar in the free 
context to an inequality of Talagrand in the classical setting ([7], [10]). 

Theorem 2.8 $W(X,S)^2 \le \Sigma(X)$.

Proof. Because of the semicircular maximum for $\chi$ we have $\chi(X) \le \chi(S) + 2^{-1}\log(\tau(X^2))$
so that $\Sigma(X) \ge 2^{-1}(\tau(X^2) - (1 + \log\tau(X^2))) \ge 0$. Thus it will suffice to prove that
$W(X,S) - (\Sigma(X))^{1/2} \le 0$.

By Lemma 2.7, the inequality (5) and the formula for the derivative of $\Sigma(X(t))$, we have
for $t > 0$,

$$\begin{aligned}
\liminf_{\varepsilon \to +0}\ & \varepsilon^{-1}\big( W(X(t + \varepsilon),S) - (\Sigma(X(t + \varepsilon)))^{1/2} - W(X(t),S) + (\Sigma(X(t)))^{1/2} \big) \\
&\ge -2^{-1}(I(X(t)))^{1/2} + 2^{-2}\, I(X(t))\,(\Sigma(X(t)))^{-1/2} \\
&\ge -2^{-1}(I(X(t)))^{1/2} + 2^{-2+1}\, I(X(t))\,(I(X(t)))^{-1/2} = 0.
\end{aligned}$$

Hence $W(X(t),S) - (\Sigma(X(t)))^{1/2}$ is an increasing function and we have

$$\lim_{t \to \infty} \big( W(X(t),S) - (\Sigma(X(t)))^{1/2} \big) = 0$$

because of the semicircular maximum and lower semicontinuity of $\chi$. It follows that

$$W(X(t),S) - (\Sigma(X(t)))^{1/2} \le 0$$

if $t > 0$. To get the inequality for $t = 0$, remark that $X(t)$ is norm-continuous so that
$W(X(t),S)$ tends to $W(X,S)$ as $t \to 0$. On the other hand, by lower semicontinuity of $\chi$,

$$\liminf_{t \to 0} \big( -(\Sigma(X(t)))^{1/2} \big) \ge -(\Sigma(X))^{1/2}.$$

□



2.9 Remark. 

Because of the coincidence of the free and classical Wasserstein distance for single self-adjoint
variables, the preceding theorem can also be written in terms of probability measures for
the classical distance. Let $\mu$ be a compactly supported probability measure on $\mathbb{R}$ and $\sigma$ a
(0,1)-semicircle distribution. Then we have

$$(W(\mu,\sigma))^2 \le \frac{1}{2}\int x^2\,d\mu(x) - \iint d\mu(s)\,d\mu(t) \log|s - t| - \frac{3}{4}.$$